41 research outputs found

    Recursive model for dose-time responses in pharmacological studies

    Get PDF
    Background: Clinical studies often track dose-response curves of subjects over time. One can easily model the dose-response curve at each time point with Hill equation, but such a model fails to capture the temporal evolution of the curves. On the other hand, one can use Gompertz equation to model the temporal behaviors at each dose without capturing the evolution of time curves across dosage Results: In this article, we propose a parametric model for dose-time responses that follows Gompertz law in time and Hill equation across dose approximately. We derive a recursion relation for dose-response curves over time capturing the temporal evolution and then specify a regression model connecting the parameters controlling the dose-time response with individual level proteomic data. The resultant joint model allows us to predict the dose-response curves over time for new individuals. Conclusion: We have compared the efficacy of our proposed Recursive Hybrid model with individual dose-response predictive models at desired time points. We note that our proposed model exhibits a superior performance compared to the individual ones for both synthetic data and actual pharamcological data. For the desired dose-time varying genetic characterization and drug response values, we have used the HMS-LINCS database and demonstrated the effectiveness of our model for all available anticancer compounds

    S1: Supplementary Information for Article: A copula based approach for design of multivariate random forests for drug sensitivity prediction

    Get PDF
    Changes in performance with prior feature selection Random forest (RF) is designed to create uncorrelated trees using random subsets of features in each node of each tree. RF by itself is a great tool for feature selection from a high dimensional set of features. But we observed that the prediction accuracy is improved when a prior feature selection (RELIEFF) [1] approach is implemented. Table A shows the performance of RF, VMRF and CMRF with and without RELIEFF feature selection in 2 drug sets of GDSC. Performance Analysis for drugsets consisting of more 8 than two drugs We have generated empirical copulas for the bivariate cases as they are able to capture all forms of dependency structures. However, generation of empirical copulas has high computational complexity along with the need for a significant number of training samples at each node. Thus for more than two drug responses, we have considered parametric copulas and the difference between Gaussian copula parameters generated using root node and split node samples instead of the integral difference between empirical copulas is used. To test our hypothesis that VMRF and CMRF will perform better than RF, we considered a drug set with 4 different drugs from CCLE with single common target between them and a drug set with 3 different drugs in GDSC with a common target between them. The CCLE set has 482 cell lines and the GDSC set has 308 cell lines. RELIEFF was used to reduce the feature space prior to random forest application. For simplicity, in this case, we’ve used 30% of the sample cell lines as training data and 70% of them as testing data

    A Copula Based Approach for Design of Multivariate Random Forests for Drug Sensitivity Prediction

    Get PDF
    Modeling sensitivity to drugs based on genetic characterizations is a significant challenge in the area of systems medicine. Ensemble based approaches such as Random Forests have been shown to perform well in both individual sensitivity prediction studies and team science based prediction challenges. However, Random Forests generate a deterministic predictive model for each drug based on the genetic characterization of the cell lines and ignores the relationship between different drug sensitivities during model generation. This application motivates the need for generation of multivariate ensemble learning techniques that can increase prediction accuracy and improve variable importance ranking by incorporating the relationships between different output responses. In this article, we propose a novel cost criterion that captures the dissimilarity in the output response structure between the training data and node samples as the difference in the two empirical copulas. We illus- trate that copulas are suitable for capturing the multivariate structure of output responses independent of the marginal distributions and the copula based multivariate random forest framework can provide higher accuracy prediction and improved variable selection. The proposed framework has been validated on genomics of drug sensitivity for cancer and cancer cell line encyclopedia database

    Design of Probabilistic Random Forests with Applications to Anticancer Drug Sensitivity Prediction- 2016

    Get PDF
    Random forests consisting of an ensemble of regression trees with equal weights are frequently used for design of predictive models. In this article, we consider an extension of the methodology by representing the regression trees in the form of probabilistic trees and analyzing the nature of heteroscedasticity. The probabilistic tree representation allows for analytical computation of confidence intervals (CIs), and the tree weight optimization is expected to provide stricter CIs with comparable performance in mean error. We approached the ensemble of probabilistic trees’ prediction from the perspectives of a mixture distribution and as a weighted sum of correlated random variables. We applied our methodology to the drug sensitivity predic- tion problem on synthetic and cancer cell line encyclopedia dataset and illustrated that tree weights can be selected to reduce the average length of the CI without increase in mean error

    A Copula Based Approach for Design of Multivariate Random Forests for Drug Sensitivity Prediction

    Get PDF
    Modeling sensitivity to drugs based on genetic characterizations is a significant challenge in the area of systems medicine. Ensemble based approaches such as Random Forests have been shown to perform well in both individual sensitivity prediction studies and team science based prediction challenges. However, Random Forests generate a deterministic predictive model for each drug based on the genetic characterization of the cell lines and ignores the relationship between different drug sensitivities during model generation. This application motivates the need for generation of multivariate ensemble learning techniques that can increase prediction accuracy and improve variable importance ranking by incorporating the relationships between different output responses. In this article, we propose a novel cost criterion that captures the dissimilarity in the output response structure between the training data and node samples as the difference in the two empirical copulas. We illus- trate that copulas are suitable for capturing the multivariate structure of output responses independent of the marginal distributions and the copula based multivariate random forest framework can provide higher accuracy prediction and improved variable selection. The proposed framework has been validated on genomics of drug sensitivity for cancer and cancer cell line encyclopedia database

    Representation of features as images with neighborhood dependencies for compatibility with convolutional neural networks

    Get PDF
    Deep learning with Convolutional Neural Networks has shown great promise in image-based classification and enhancement but is often unsuitable for predictive modeling using features without spatial correlations. We present a feature representation approach termed REFINED (REpresentation of Features as Images with NEighborhood Dependencies) to arrange high-dimensional vectors in a compact image form conducible for CNN-based deep learning. We consider the similarities between features to generate a concise feature map in the form of a two-dimensional image by minimizing the pairwise distance values following a Bayesian Metric Multidimensional Scaling Approach. We hypothesize that this approach enables embedded feature extraction and, integrated with CNN-based deep learning, can boost the predictive accuracy. We illustrate the superior predictive capabilities of the proposed fra- mework as compared to state-of-the-art methodologies in drug sensitivity prediction scenarios using synthetic datasets, drug chemical descriptors as predictors from NCI60, and both transcriptomic information and drug descriptors as predictors from GDSC

    Investigation of model stacking for drug sensitivity prediction

    Get PDF
    Background: A significant problem in precision medicine is the prediction of drug sensitivity for individual cancer cell lines. Predictive models such as Random Forests have shown promising performance while predicting from individual genomic features such as gene expressions. However, accessibility of various other forms of data types including information on multiple tested drugs necessitates the examination of designing predictive models incorporating the various data types. Results: We explore the predictive performance of model stacking and the effect of stacking on the predictive bias and squarred error. In addition we discuss the analytical underpinnings supporting the advantages of stacking in reducing squared error and inherent bias of random forests in prediction of outliers. The framework is tested on a setup including gene expression, drug target, physical properties and drug response information for a set of drugs and cell lines. Coclusion: The performance of individual and stacked models are compared. We note that stacking models built on two heterogeneous datasets provide superior performance to stacking different models built on the same dataset. It is also noted that stacking provides a noticeable reduction in the bias of our predictors when the dominat eigenvalue of the principle axis of variation in the residuals is significantly higher than the remaining eigenvalues

    Recursive model for dose-time responses in pharmacological studies

    Get PDF
    Background: Clinical studies often track dose-response curves of subjects over time. One can easily model the dose-response curve at each time point with Hill equation, but such a model fails to capture the temporal evolution of the curves. On the other hand, one can use Gompertz equation to model the temporal behaviors at each dose without capturing the evolution of time curves across dosage Results: In this article, we propose a parametric model for dose-time responses that follows Gompertz law in time and Hill equation across dose approximately. We derive a recursion relation for dose-response curves over time capturing the temporal evolution and then specify a regression model connecting the parameters controlling the dose-time response with individual level proteomic data. The resultant joint model allows us to predict the dose-response curves over time for new individuals. Conclusion: We have compared the efficacy of our proposed Recursive Hybrid model with individual dose-response predictive models at desired time points. We note that our proposed model exhibits a superior performance compared to the individual ones for both synthetic data and actual pharamcological data. For the desired dose-time varying genetic characterization and drug response values, we have used the HMS-LINCS database and demonstrated the effectiveness of our model for all available anticancer compounds

    Application of transfer learning for cancer drug sensitivity prediction

    Get PDF
    Background: In precision medicine, scarcity of suitable biological data often hinders the design of an appropriate predictive model. In this regard, large scale pharmacogenomics studies, like CCLE and GDSC hold the promise to mitigate the issue. However, one cannot directly employ data from multiple sources together due to the existing distribution shift in data. One way to solve this problem is to utilize the transfer learning methodologies tailored to fit in this specific context. Results: In this paper, we present two novel approaches for incorporating information from a secondary database for improving the prediction in a target database. The first approach is based on latent variable cost optimization and the second approach considers polynomial mapping between the databases. Utilizing CCLE and GDSC databases, we illustrate that the proposed approaches accomplish a better prediction of drug sensitivities for different scenarios as comapred to the existing approaches. Conclusion: We have comapred the performance of the proposed predictive models with database-specifc individual models as well as existing transfer learning approaches. We note that our proposed approaches exhibit superior performance compared to the abovementioned alternative techniques for predicting senstivity for different anti-cancer compound, particularly the nonlinear mapping model shows the best overall performance

    A Copula Based Approach for Design of Multivariate Random Forests for Drug Sensitivity Prediction

    Get PDF
    Modeling sensitivity to drugs based on genetic characterizations is a significant challenge in the area of systems medicine. Ensemble based approaches such as Random Forests have been shown to perform well in both individual sensitivity prediction studies and team science based prediction challenges. However, Random Forests generate a deterministic predictive model for each drug based on the genetic characterization of the cell lines and ignores the relationship between different drug sensitivities during model generation. This application motivates the need for generation of multivariate ensemble learning techniques that can increase prediction accuracy and improve variable importance ranking by incorporating the relationships between different output responses. In this article, we propose a novel cost criterion that captures the dissimilarity in the output response structure between the training data and node samples as the difference in the two empirical copulas. We illus- trate that copulas are suitable for capturing the multivariate structure of output responses independent of the marginal distributions and the copula based multivariate random forest framework can provide higher accuracy prediction and improved variable selection. The proposed framework has been validated on genomics of drug sensitivity for cancer and cancer cell line encyclopedia database
    corecore